Historic samples reveal loss of wild genotype through domestic chicken introgression during the Anthropocene

Human activities have precipitated a rise in the levels of introgressive gene flow among animals. The investigation of conspecific populations at different time points may shed light on the magnitude of human-mediated introgression. We used the red junglefowl Gallus gallus, the wild ancestral form of the chicken, as our study system. As wild junglefowl and domestic chickens readily admix, conservationists fear that domestic introgression into junglefowl may compromise their wild genotype. By contrasting the whole genomes of 51 chickens with 63 junglefowl from across their natural range, we found evidence of a loss of the wild genotype across the Anthropocene. When comparing against the genomes of junglefowl from approximately a century ago using rigorous ancient-DNA protocols, we discovered that levels of domestic introgression are not equal among and within modern wild populations, with the percentage of domestic ancestry around 20–50%. We identified a number of domestication markers in which chickens are deeply differentiated from historic junglefowl regardless of breed and/or geographic provenance, with eight genes under selection. The latter are involved in pathways dealing with development, reproduction and vision. The wild genotype is an allelic reservoir that holds most of the genetic diversity of G. gallus, a species which is immensely important to human society. Our study provides fundamental genomic infrastructure to assist in efforts to prevent a further loss of the wild genotype through introgression of domestic alleles.


Introduction
Human activities over the last few hundred years have exerted an unprecedented impact on the environment and biodiversity [1][2][3][4][5][6]. Among the least known repercussions of the modern environmental crisis is a marked increase in anthropogenically mediated genetic introgression between animals [7][8][9]. As urbanization and habitat loss accelerate [7,10], biotic distribution ranges contract or expand, placing species that had previously been isolated by natural barriers into contact, and leading to a rise in introgressive gene flow [9,11]. The documentation of this increase in genetic admixture remains in its infancy as the investigation of historic DNA samples has posed numerous challenges [12,13]. A successful comparison of genomes at different time points may provide insights into the long-term effects of human-induced introgression.
Red junglefowl (Gallus gallus) are the wild, ancestral form of the chicken. The origin of their domestication has been placed in Indochina and/or southernmost China [14] within the range of the junglefowl subspecies spadiceus, but little is known about the spread of domestication and subsequent genetic exchange between wild junglefowl and domestic chickens. While the discovery of domestication-specific alleles suggests continual and fairly strong reproductive isolation between chickens and wild junglefowl [14,15], signatures of genomic admixture indicate that the domestic-wild divide has not always been absolute: several studies have established gene flow from domestic chickens into wild junglefowl, although the opposite scenario remains less well documented [16][17][18][19][20][21]. Although domestic introgression can be beneficial to populations of a wild ancestral species [22,23], preserving wild genotypes of important human domesticates is generally regarded as crucial to preclude sweeps of genomic homogenization and/or a loss of the full range of wild-type genetic diversity [24,25]. Variation within the ancestral population is capable to confer advantages to the domestic population [26]. As such, a loss of wild-type genetic diversity in junglefowl may hinder the safeguarding of one of humanity's most important food sources. Yet the ongoing global environmental crisis may have promoted an increase of domestic introgression into junglefowl [19], with suggestions that truly unadmixed wild red junglefowl may not have survived when we entered the Anthropocene during the start of the industrial revolution [27].
To investigate the extent of domestic introgression in modern red junglefowl populations, we compared the genomes of 51 chickens with 63 junglefowl-most of the latter dating back >100 years. We ensured a representation of all five extant subspecies of wild G. gallus in our sampling, and maximized geographic coverage from across the entire historic distribution of the species (Fig 1A and S1 Table). All in all, there are three main groups being compared in this study-(1) historic red junglefowl from a century ago (S1 Table), (2) modern red junglefowl collected at the beginning of this century, and (3) domestic chickens collected in the same time period as the previous. By using genomes of red junglefowl from a century ago as reference points for the wild genotype, hailing from a time when there was a larger area of wild habitats and fewer chickens to interbred with, we quantified excess domestic introgression in modern wild populations. We then identified multiple domestication markers in which chickens are deeply differentiated from wild junglefowl, regardless of breed or geographic provenance, with eight genes under selection. The latter are involved in pathways dealing with the nervous system, reproduction and vision. Our study sheds light on the impact of human-influenced introgression on wild junglefowl and provides fundamental genomic infrastructure to assist future efforts to maintain a substantial pool of their wild genotype.

Results
We sequenced the genomes of 45 historic red junglefowl, with a total of four historic samples (1 bankiva, 1 murghi, 2 spadiceus) dropping out due to high missingness or severe post mortem damage. The final quality filtered reads were around 89 bp long on average, with an average genomic coverage of 7.5X.

Deepest genomic division is between wild red junglefowl and chickens
Various analytical approaches indicated that the deepest population-genomic divergence in G. gallus is between chickens and their wild counterparts (Figs 1C and S1) [19,28]. Within our historic wild G. gallus samples, the subspecies bankiva forms a distinct cluster from the other subspecies (S1 and S2 Figs). For the remaining taxa, there seems to be an overall clinal geographic structure from Indian murghi to Southeast Asian spadiceus, with the latter being closer to domestic chickens (Figs 1C and S2). This result emerged irrespective of the inclusion of chickens from extremely different geographic backgrounds and pedigrees (S1 Fig and S2  Table). The phylogenomic tree constructed from >1 million genome-wide markers found all domestic individuals forming one well-supported clade within the G. gallus complex (S1 Fig).
Using various approaches, we did not find any historic artifact biasing our data (S2 and S3 Figs).

Historic DNA reveals higher domestic introgression in modern wild junglefowl versus pre-Anthropocene samples
Principal component analysis (PCA) of almost 3 million genome-wide markers suggests that present-day wild populations possess more domestic alleles compared to populations from a century ago (Fig 1D). Here, we limited our analysis to only G. g. spadiceus, the putative ancestral subspecies domestic chickens originated from [14], to reduce artifacts of population structure from other subspecies (S2 Fig). Observed heterozygosity (H) has decreased drastically and Tajima's D values have become positive from historic (H = 0.0470, Tajima's D = -1.15996) to modern G. g. spadiceus (H = 0.00464, Tajima's D = 0.470075), reflecting a loss of allelic diversity in modern G. g. spadiceus. The corresponding statistics for our panel of domestic chickens were mostly similar to modern G. g. spadiceus (H = 0.00503, Tajima's D = -9.32).
Within G. g. spadiceus, modern samples seem to exhibit a clinal population structure from the south (Singapore) to the north of the subspecies range (northeastern India), with seemingly increasing domestic contributions (Figs 1D and S2). The purest examples of modern freeroaming junglefowl exhibit a genomic profile that is almost fully embedded with the historic wild population, while other modern free-roaming samples from throughout the range of spadiceus have profiles that cluster much more closely with domestics ( Fig 1D), indicating substantial heterogeneity in the level of introgression, sometimes even within modern wild populations.
We quantified levels of excess domestic introgression into modern spadiceus populations by computing D-statistics using the topology (((spadiceus-historic, spadiceus-modern), chicken), Chinese bamboo partridge Bambusicola thoracicus) [29]. We carried out multiple Dstatistics calculations using different subsets of populations across the range of spadiceus to ensure that population structure did not affect our tests. While these analyses are unable to evaluate the level of domestic introgression that was already present in pre-Anthropocene wild red junglefowl from one century ago, we showed that modern populations of the subspecies spadiceus exhibit a significant excess of such introgression as compared to populations from just before the Anthropocene (Fig 1B and S3 Table).
When inferring domestic versus wild ancestry across around 2.7 million SNPs using the program Struct-f4 [30], a pattern emerged in which a substantial proportion of modern individuals from throughout the range of spadiceus were characterised by a large amount of domestic introgression, reflecting our PCA results (Figs 1D and 2A). Global ancestry inference using the same program found most modern red junglefowl to be distinct from the other populations, with contributions from domestic chickens (20-50%) and historic red junglefowl (45-80% in the four genomically most "wild-type" junglefowl individuals from Singapore) ( Fig 2B). Local ancestry inference with almost 150,000 single nucleotide polymorphisms (SNPs) using the program EILA (Efficient inference of local ancestry) [31] found the Singaporean population broadly heterogeneous, with percentages of domestic-introgressed alleles ranging from 5-97% depending on individual ( Fig 2C). All individuals screened from northern Thailand (Chiang Mai), northeast India (Manipur) and southwest China (Yunnan) displayed percentages of domestic-introgressed alleles between 83%-96% ( Fig 2C). When restricting EILA analysis to using only commercial farm breeds as the domestic reference, a similar trend emerged, but the percentages of domestic contribution fell to 30-70% (S4 Table and  While the accuracy of estimating local ancestry sources in the genome remain contentious, the trend obtained from our EILA analysis is reflective of our global ancestry estimation.

Identification of possible domestication loci under selection in chickens
Following from our evidence that domestic introgression into wild red junglefowl has increased in modern times, we sought to find genomic areas of elevated divergence between chickens and historic junglefowl. We performed scans of genomic divergence across the chicken reference genome (RefSeq Assembly Accession: GCF_000002315.6) in 50kb sliding  [83]. Circles refer to collection localities for the samples used in this study. The numbers next to the black circles are the sample size for each modern red junglefowl population (sample size was 1 for all red circles). The distribution of subspecies spadiceus, the putative ancestor of modern chickens, is encircled by a stippled line, and divided into its northern and southern range as defined in this work. (B) D-statistics testing for excess allele sharing between domestic chickens and red junglefowl of the ancestral subspecies spadiceus using the phylogeny (((spadiceus-historic, spadiceus-modern), chicken), windows using D XY as a measure of differentiation, comparing domestics to wild historic samples of each of the five red junglefowl subspecies (Fig 3). We also used F ST as another measure of differentiation, but preliminary plot inspection indicated an excess of potential domestication loci which may be an artifact of the difference in sample size among populations (S5 Fig). We chose a wide range of domestics from farm breeds to village chickens to reduce signals associated with post-domestication breed formation and adaptive differentiation. Most genomic divergence between domestics and wild individuals was at a roughly constant background level, occasionally disrupted by so-called 'islands of extreme genomic divergence' [32,33]. We harvested regions with D XY sliding window values that exceed the 99 th percentile for each pairwise comparison and in this way identified 883 potential domestication-related genic regions. These loci were tested for positive selection using aBSREL with seven outgroups across the avian phylogeny [34]. A total of 112 genes were found to be under selection in the Gallus gallus complex, with a subset of eight solely in domesticated chickens not including selection signals in the outgroups (Figs 3 and S6 and Tables 1 and S5). Genes identified as deeply divergent and under selection between domestics and historic wild samples of various subspecies play an important role in such processes as vision maintenance (CERKL [35]), reproduction (CFAP97
By comparing the genome position of the eight genes under selection (Table 1) with our local ancestry results (Fig 2C), we deduced the most likely origin of these genes in the 15 modern red junglefowl individuals included. Out of the eight genes, we were unable to infer the local ancestry for MUS81 and GPATCH8 due to a lack of SNPs in the surroundings of their genomic location in EILA. For the other six genes, the majority of all 15 modern red junglefowl screened possessed a copy of domestic ancestry (SLTM-73.3%, CFAP97-80%, CAPS2-66.7%, C2CD5-60%, DYNC2H1-66.7%) with only one exception (CERKL-20%), underpinning the veracity of our approach of using historic red junglefowls as a reference. If these modern individuals had been chosen for domestic selection scans, these genes would not have been flagged as potential candidates as they have already been homogenized in modern junglefowl via domestic introgression.

Increasing levels of domestic introgression threaten the modern wild genotype of G. gallus
The deep genetic rift between domestics and all wild junglefowl may appear unusual because the different subspecies of red junglefowl have had hundreds of thousands of years to differentiate, whereas domestication is only thought to date back approximately 4,000 to 10,000 years [27, 43,44] (Figs 1 and S1). This rift is indicative of pronounced long-term gene flow among all adjoining continental subspecies of the red junglefowl into historic times, preventing the formation of any deep population-genomic barriers between wild populations. Furthermore, this domestic-wild rift is likely an effect of a strong bottleneck during domestication similar to other domestic animals [45,46]. This clear distinction between our historic G. gallus individuals and modern chickens suggests that there may have been some form of spatial isolation between chickens and the historic wild junglefowl selected for this study, making the latter an appropriate representative of the wild genotype in further comparisons.
The substantial increase in domestic introgression over the span of only a few decades does not bode well for safeguarding the wild allelic diversity of G. gallus [20,47,48]. This intensification of domestic admixture is likely driven by anthropogenic change to the native environment of G. gallus, which has experienced a high level of habitat loss following the relentless human encroachment into tropical Asia's last wilderness areas in recent decades [49][50][51]. While the increase in domestic introgression is heterogenous throughout the range, all modern populations will-by now-be in proximity to released domestic chickens that are free to interbreed with them. Our data suggest that intensifying introgression from highly homogenised domestic stock into the diverse wild stock has led to an overall loss of heterozygosity by an order of magnitude. This result may be counter-intuitive at first, given that outbreeding and introgression are theoretically expected to lead to increasing heterozygosity. However, chickens, despite their great phenotypic diversity and population size, are known to have undergone an extreme bottleneck and are genomically impoverished akin to other domestic animals, as reflected by their low heterozygosity and Tajima D values in our study. It is likely the introduction of highly homogenised domestic alleles into the wild red junglefowl gene pool that has caused wild-type genetic diversity to plummet so precipitously. As a caveat, our conservative assignment of some samples of questionable provenance to 'domestic village chickens' (see methods) may have led to an overestimate of the absolute level of domestic introgression. Estimation of local ancestry using only commercial farm samples reduced the domestic contribution by about 30% in some samples (S4 Table). However, this potential bias is incompatible with the stark contrast between the clear genomic division between historic junglefowl and chickens as compared to the more blurred division when modern junglefowl are added (Fig 1), which is clearly due to excess introgression in modern samples. A previous study has found a correlation between plumage and genomic makeup in a chicken-junglefowl cline [19], and while we have photographic evidence only for some of the modern red junglefowl included in this study, this evidence suggests that the samples that cluster closer to domestics do exhibit some phenotypic differences from the ancestral wild phenotype (S7 Fig). An increase in domestic introgression would allow the wild populations to continue to persist. Indeed, there are documented cases of adaptive introgression aiding wild ancestral populations of domesticated animals in their survival (reviewed in [52]). At the same time, it is unknown whether such a positive evolutionary development would apply in junglefowl. More localized studies on chicken-junglefowl interactions [19] and studies concentrating on single traits [21] have shown that at least some of the domestic admixture into junglefowl may fairly be characterised as adaptive introgression. However, an extensive phenotypic dataset combined with genomic data would be required to conclusively address this question. In the meantime, the steep reduction in genetic diversity following in the wake of domestic introgression (see above) raises concerns that the admixture process may lead to a dangerous reduction of allelic diversity in the wild type.

The loss of domestication markers due to domestic introgression
The quest for domestication markers is important from two perspectives: (1) it highlights genes and other genomic motifs which render chickens what they are, and which help us define the most important traits underlying domestication; (2) it assists in future attempts to diagnose wild populations which have experienced little admixture with domestic alleles and retained some of the most important genomic architecture characterizing the wild genotype. The search for these loci has been a hot biological topic for decades [14,15,25] and has routinely involved comparisons between modern wild and domestic populations [14,15,25]. While some domestication markers selected for in domestics may be selected against in wild population, allelic homogenization following extensive domestic introgression into modern wild populations could mask many other loci that have played an important role in the original domestication process. The high proportion of modern red junglefowl possessing the domestic copy of genes under selection in domestic chickens is consistent with such a scenario (Table 1 and Fig 2). While we do not possess ancient genomes of red junglefowl, genes identified using historic G. gallus samples as opposed to modern ones would suffer this problem to a much lower extent due to their less pronounced levels of introgression. As such, the analysis of historic G. gallus samples is crucial to shedding light on the differences between the domestic chicken and its wild counterpart.

Genes involved in the nervous system are dissimilar between red junglefowl and chickens
Our study identifies four genes involved in the nervous system that are selected for in chickens but not their wild counterpart. Both DYNC2H1 and SLTM code for proteins involved in the Sonic hedgehog pathway that are critical in the formation of organs and the central nervous system during development by regulating neural crest cells [53][54][55]. We also identified two other genes with relevance to the nervous system-CAPS2 modulates synaptic activities [39] and C2CD5 regulates body weight and appetite in the hypothalamus [38]. These properties are not surprising given that tameness is one of the main characteristics being selected for during domestication [56,57].
Chickens are known to have poorer vision compared to their wild counterparts [58], with a handful of vision genes under positive selection that have been proposed to cause this phenotype [59]. Here we add another vision gene, CERKL, that plays a role in reducing oxidative damage on the retina, which is equally under positive selection [35]. Interestingly, wild spadiceus also seem to be experiencing positive selection in this gene, either due to (1) introgression of the domestic copy of CERKL into wild spadiceus, or (2) because spadiceus may naturally be endowed with a weaker vision.
A previous study suggested that TSHR, a gene encoding for the thyroid-stimulating hormone receptor protein, could be a domestication marker based on its near fixation in chickens compared to G. g. spadiceus [15], a result contradicted by later research suggesting that this pattern is merely a by-product of increasing allele frequency in wild populations [14]. This latter gene is also not flagged in our analyses.
Using genomes of historic specimens from throughout the range of red junglefowl, we were able to (1) demonstrate an intensification of introgression from chickens into modern red junglefowl across the Anthropocene and (2) identify eight candidate genes under selection in chickens. Given that our historic red junglefowl samples exhibited less domestic admixture compared to modern fowl, we were able to identify possible genes in our selection scan that would remain intractable in comparisons involving only modern junglefowl, as the latter's genomes are frequently homogenized due to extensive domestic introgression. These candidate genes will be critical for future genomic enquiries to add to our understanding of the functional differences between domestic chickens and wild junglefowl.

Population sampling
We obtained 45 dried toepad samples of G. gallus collected between 1874 and 1939 across its native range preserved at the Natural History Museum at Tring (UK) and Lee Kong Chian Natural History Museum (Singapore) (Fig 1 and S1 Table). The level of domestic introgression in historic samples is likely lower compared to modern-day samples given that habitat encroachment in their native range was less severe in the past, allowing us to use them as a reference point to quantify excess modern introgression. We also incorporated an additional 69 samples of red junglefowl and various chicken breeds available from GenBank, including eight free-roaming G. g. spadiceus from Singapore, seven farm individuals and two historic samples from peninsular Malaysia from Wu et al. (2020) [19] along with 8 G. g. spadiceus from Wang et al. (2020) [14] (S2 Table), adding them to our dataset of 45 historic G. gallus for a total of 114 samples. We additionally included sequences of one Bambusicola thoracicus individual from GenBank as an outgroup [60] (S2 Table), leading to a total dataset of 115 samples. It is difficult to determine the origin of red junglefowl samples from online databases without phenotypic validation. Therefore, we have opted in favor of a conservative approach whereby samples labelled G. gallus of uncertain but likely domestic origin were labelled as village chickens (S2 Table) and only kept those samples confidently labelled as modern red junglefowl.

Data generation and basic processing
DNA extraction, library preparation and whole-genome resequencing. Our laboratory protocol was designed to accommodate low-quality DNA extracted from the toepads of historic specimens, and to account for sources of potential DNA contamination. We extracted genomic DNA of toepad material from the 45 historic specimens using the DNEasy Blood & Tissue Kit (Qiagen, Hilden, Germany) along with a negative control. Subsequently, we created paired-end libraries using the NEBNext FFPE DNA Repair Mix (New England Biolabs, Massachusetts, United States) and the NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs) with the extraction negative control and an additional library negative control. Both protocols were performed under sterile conditions in a dedicated ancient DNA facility with modifications following Chattopadhyay et al. [61].
DNA concentration for each extract and library was quantified using a Qubit 2.0 High Sensitivity DNA assay (Invitrogen, Carlsbad, USA) and sequence length of each library was visualized with an AATI Fragment Analyzer (Advanced Analytical Technologies, Ankeny, USA). We did not detect any library peak in the AATI profiles of the negative controls. Once checked, libraries were sequenced at NovogeneAIT Genomics (Singapore) on an Illumina HiSeq 4000 platform to produce 150 bp paired-end reads.
Quality filtering. Initial quality assessment of raw reads was carried out using FastQC (Babraham Bioinformatics, USA).
For all 115 individuals, adaptor trimming was carried out using cutadapt v2.3 [62]. Historic DNA from museum samples is prone to exogenous contamination, such as from bacteria and humans [12]. We removed potential contamination after adaptor removal by mapping our reads to three reference genomes-(1) human (GenBank Assembly Accession: GCA_000001405.28), (2) chicken (RefSeq Assembly Accession: GCF_000002315.6), (3) and a compilation of all available bacterial genomes on the RefSeq database. We extracted reads that mapped uniquely to the chicken reference genome in FastQ_Screen v0.13.0 before continuing [63]. These filtered reads were mapped to the chicken reference genome (RefSeq Assembly Accession: GCF_000002315.6) using BWA-MEM [64]. Low quality reads (MAPQ score <20) were filtered with SAMtools v1.9 to ensure unique mapping [65]. Picard v2.20.2 (http:// broadinstitute.github.io/picard/) was subsequently used to assign read group information and mark duplicates. The original alignments were then realigned and refined using RealignedTar-getCreator and IndelRealigner as implemented in the Genome Analysis Toolkit v3.8-0 (Broad Institute, USA) [66]. Lastly, we rescaled the quality scores of all historic samples using a Bayesian statistical model of DNA damage as implemented in MapDamage 2.0 to account for post mortem DNA damage [67]. We found elevated C to T substitutions in the first few bases of read 1 and elevated G to A substitutions in the last few bases of read 2 in the MapDamage report. We proceeded to trimming the first and last five base pairs off each sample's sequences to ensure the quality of our data before further analysis. The final bam files were checked in Qualimap v2.2.2 for mapping quality and sequencing bias before variant calling [68]. We initially also added 44 modern samples of various subspecies of G. gallus from Wang et al. (2020) [14] but found them, specifically murghi due to much lower coverage, to exhibit a puzzling signal of population structure far outside all our other samples and therefore opted to remove them to preclude artifacts (S2 Fig). Population genomic approaches. We explored the possibility of historic artifacts based on the degraded nature of our samples by projecting the variation of historic samples on the basis of the higher quality modern samples using smartpca in EIGENSOFT (http://www.hsph. harvard.edu/alkes-price/software/) (S2 and S3 Figs). We did not find any historic artifacts biasing our data. Therefore, we then used ANGSD v0.923 (settings: -uniqueOnly 1, -remove_bads 1, -only_proper_pairs 1 -SNP_pval 1e-6, -minMapQ 30 -minQ 30 -minMaf 0.03 -minInd n -minIndDepth 3 -geno_mindepth 3) for SNP capture on different G. gallus datasets. This software has been specifically designed to conduct population genetic analysis with low coverage genome data, and was hence deemed suitable for the nature of our historic samples [69,70]. For initial data exploration, a concatenated SNP tree was created in RAxML with Lewis correction using the GTRGAMMA model with mostly domestic and historic wild samples, and four modern samples as representatives [71]. We also conducted PCA on the G. gallus samples to assess population subdivision using PCAngsd [72]. We found the PCA to be polarized by the Javan subspecies G. g. bankiva (n = 5) (S2 Fig). We also found certain samples of commercial breeds and village chickens (n = 22) to exert high polarization on the visualization when added to the dataset: these latter samples, while still within the domestic cluster, scattered widely and illustrated population structure within the cluster. Moreover, we found an additional seven samples to display high missingness at called loci: five historic wild, one modern wild and one domestic sample from GenBank. To achieve a clearer spread across G. gallus subspecies, we removed these subpar individuals together with the samples of G. g. bankiva (n = 5), modern G. gallus (n = 15) and the selected commercial breeds (n = 22) prior to conducting a PCA with only historic G. gallus with the same filters (n = 61; 7,492,873 SNPs). We also conducted PCA on only G. g. spadiceus with the following filters: uniqueOnly 1, -remove_bads 1, -only_pro-per_pairs 1 -SNP_pval 1e-6, -minMapQ 30 -minQ 30 -minMaf 0.03 -minInd 46 -minI-ndDepth 3 -geno_mindepth 3 (n = 62; 2,709,190 SNPs). We only used a subset of domestic breed individuals for the PCAs to retain valuable analytical PCA space for the red junglefowl samples of interest. For the latter analyses, we included all breeds available to represent domestic chickens (n = 50).
We additionally conducted Bayesian clustering in STRUCTURE with a random subset of 100,000 SNPs, employing the wrapper Structure_threader v1.2.4 to parallelize all runs on the G. gallus dataset (n = 61 + 4 modern spadiceus) [73,74]. Clustering analysis was run for K = 1 to 10 with 10 replicates for each K, a burn-in of 100,000 generations and 500,000 further Monte-Carlo Markov Chain (MCMC) generations. To aggregate replicates for each K, we ran the STRUCTURE output through CLUMPAK using the FullSearch algorithm for K = 1 to 4 and Greedy algorithm for K = 5 to 10 [75].
We tested for excess allele sharing between G. g. spadiceus and the domestic population by applying pairwise ABBA-BABA tests as implemented in ANGSD with error correction and ancient transition removal [76]. We used B. thoracicus as an outgroup because other members of the genus Gallus may have introgressed with G. gallus [15,77,78]. The tree topology tested was thus (((RJF1, RJF2), domestics), outgroup), where RJF refers to either modern or historic G. g. spadiceus from various localities.
We inferred global ancestry using 2,603,726 SNPs from the 15 modern G. g. spadiceus samples using Struct-f4 individually [30]. Struct-f4 considers allele sharing, population-specific drift and potential inbreeding, all of which may distort classical population clustering analyses. We first converted the tped file into multiple 5Mb-long files using Tped2Structf4.pl before running the calc-f4 C script (-n 67). The prior output was used to run struct-f4 (-K 4 -m 200,000) in R. All scripts are included in the Struct-f4 package [30].
To identify specific regions showing excess allele sharing between each G. gallus subspecies and domestic stock, we used D XY as an estimate of differentiation. We first called SNPs that are present in both wild individuals of each G. gallus subspecies and domestics (-uniqueOnly 1, -remove_bads 1, -only_proper_pairs 1, minMapQ 30, -minQ 20, -setMinDepth 3, -SNP_pval 1e-6, -skipTriallelic 1, -minInd n), with ANGSD using B. thoracicus as an ancestral reference. We obtained D XY using popgenwindows.py (available at https://github.com/ simonhmartin/genomics_general) with a 50 kb sliding window and a step size of 10 kb. We identified D XY values above the 99 th percentile of each pairwise comparison, and harvested genes in those regions for all five subspecies for selection tests. Annotated loci or genes present in these selected genomic regions were identified using the chicken reference genome (RefSeq Assembly Accession: GCF_000002315.6). We calculated heterozygosity and Tajima's D using ANGSD -dosaf 1 and realSFS. Tajima's D values were later summarized using thetaStat.